Parallel and Distributed Compressed Indexes
Authors
Abstract
We study parallel and distributed compressed indexes. Compressed indexes are a new and highly functional way to index text strings. They exploit the compressibility of the text, so that their size is a function of the compressed text size. Moreover, they support a considerable number of operations, more than many classical indexes do. We use this extended functionality to obtain near-optimal speedups on a shared-memory parallel machine for several stringology problems. We also show how to distribute compressed indexes across several machines.
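To make the idea of an index that answers pattern queries concrete, here is a minimal sketch of FM-index backward search in Python. It is an illustrative assumption, not the method of this work: the class name `FMIndex`, its `count` and `count_many` methods, and the sentinel choice are hypothetical, and the Burrows-Wheeler transform (BWT) and occurrence table are stored uncompressed for readability (practical compressed indexes replace them with compressed structures such as wavelet trees over compressed bitvectors). The `count_many` helper only shows that a read-only index can be shared by several concurrent workers; in CPython the global interpreter lock prevents real speedup for this pure-Python code.

```python
from concurrent.futures import ThreadPoolExecutor


class FMIndex:
    """Toy FM-index: plain BWT plus occurrence table (NOT compressed)."""

    def __init__(self, text: str, sentinel: str = "\0") -> None:
        if sentinel in text:
            raise ValueError("sentinel must not occur in text")
        t = text + sentinel
        # Naive suffix array construction; adequate only for small demos.
        sa = sorted(range(len(t)), key=lambda i: t[i:])
        # BWT: the character preceding each suffix in sorted order.
        self.bwt = "".join(t[i - 1] for i in sa)
        # C[c] = number of characters in t strictly smaller than c.
        self.C = {}
        total = 0
        for c in sorted(set(t)):
            self.C[c] = total
            total += t.count(c)
        # occ[c][i] = number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(t) + 1) for c in self.C}
        for i, ch in enumerate(self.bwt):
            for c, row in self.occ.items():
                row[i + 1] = row[i] + (ch == c)

    def count(self, pattern: str) -> int:
        """Count occurrences of pattern by backward search over the BWT."""
        lo, hi = 0, len(self.bwt)  # current suffix-array range [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo

    def count_many(self, patterns, workers: int = 4):
        """Answer a batch of queries concurrently against the shared index."""
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self.count, patterns))


idx = FMIndex("abracadabra")
print(idx.count("abra"), idx.count_many(["a", "ra", "cad"]))
```

Backward search processes the pattern right to left and narrows a range of sorted suffixes using only the `C` array and rank (`occ`) queries, which is why compressed representations that support fast rank can answer the same query in space close to the compressed text size.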
Similar resources
Massive-Scale RDF Processing Using Compressed Bitmap Indexes
The Resource Description Framework (RDF) is a popular data model for representing linked data sets arising from the web, as well as large scientific data repositories such as UniProt. RDF data intrinsically represents a labeled and directed multi-graph. SPARQL is a query language for RDF that expresses subgraph pattern-finding queries on this implicit multigraph in a SQL-like syntax. SPARQL quer...
HCB-Tree: A Height Compressed B-Tree for Parallel Processing
B-tree type indexes are popular in database applications because they provide a fast access path to large databases. In this paper we present a new storage structure which is suitable for fast parallel searching by using B-tree like indexes [1]. We call this modified B-tree structure the Height Compressed B-tree (HCB-tree). The main results presented in this paper are that parallel processing...
Indexes and Computation over Compressed Structured Data (Dagstuhl Seminar 13232)
Belief Change and Argumentation in Multi-Agent Scenarios (Dagstuhl Seminar 13231), Jürgen Dix, Sven Ove Hansson, Gabriele Kern-Isberner, and Guillermo Simari; Indexes and Computation over Compressed Structured Data (Dagstuhl Seminar 13232), Sebastian Maneth and Gonzalo Navarro; Virtual Realities (Dagstuhl S...
Better bitmap performance with Roaring bitmaps
Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus we might prefer compressed bitmap indexes. Following Oracle’s lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it...
Pattern Kits
Compressed full-text indexes have been one of pattern matching’s most important success stories of the past decade. We can now store a text in nearly the information-theoretic minimum of space, such that we can still quickly count and locate occurrences of any given pattern. However, some files or collections of files are so huge that, even compressed, they do not all fit in one machine’s inter...